String manipulation with stringr :: CHEAT SHEET


The stringr package provides a set of internally consistent tools for working with character strings, i.e. sequences of characters surrounded by quotation marks. 


Detect Matches 




str_detect(string, pattern) Detect the 
presence of a pattern match in a string. 
str_detect(fruit, "a")


str_which(string, pattern) Find the indexes of 
strings that contain a pattern match. 
str_which(fruit, "a")


str_count(string, pattern) Count the number 
of matches in a string. 
str_count(fruit, "a")


str_locate(string, pattern) Locate the 
positions of pattern matches in a string. Also 
str_locate_all. str_locate(fruit, "a")
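
The four detection functions above return different shapes for the same input, which is easiest to see side by side. The following is a minimal sketch; the vector x is a hypothetical input chosen so the results can be checked by hand, not the fruit dataset used elsewhere on this sheet.

```r
library(stringr)

# Hypothetical input vector (an assumption for illustration)
x <- c("apple", "banana", "cherry")

detected  <- str_detect(x, "a")  # one logical per string
positions <- str_which(x, "a")   # indexes of the strings that matched
counts    <- str_count(x, "a")   # number of matches in each string
first_loc <- str_locate(x, "a")  # matrix with columns "start" and "end", first match only

print(detected)   # TRUE TRUE FALSE
print(positions)  # 1 2
print(counts)     # 1 3 0
print(first_loc)  # rows: 1 1 / 2 2 / NA NA
```

Strings with no match give FALSE, a count of 0, and NA positions, while str_which simply leaves them out.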


Mutate Strings 
str_sub() <- value. Replace substrings by
identifying the substrings with str_sub() and 
assigning into the results. 

str_sub(fruit, 1, 3) <- "str" 


str_replace(string, pattern, replacement) 
Replace the first matched pattern in each 
string. str_replace(fruit, "a", "-") 


str_replace_all(string, pattern, 
replacement) Replace all matched patterns 
in each string. str_replace_all(fruit, "a", "-") 


str_to_lower(string, locale = "en")¹ Convert
strings to lower case.
str_to_lower(sentences)


str_to_upper(string, locale = "en")¹ Convert
strings to upper case.
str_to_upper(sentences)


str_to_title(string, locale = "en")¹ Convert
strings to title case. str_to_title(sentences)
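
The difference between replacing the first match and replacing every match, and the assignment form of str_sub(), can be sketched as follows. The vector x is a hypothetical input, not one of the datasets shipped with stringr.

```r
library(stringr)

# Hypothetical input vector (an assumption for illustration)
x <- c("banana", "papaya")

first_only <- str_replace(x, "a", "-")      # only the first "a" in each string
every_one  <- str_replace_all(x, "a", "-")  # every "a" in each string

# str_sub() <- value modifies the vector in place, so work on a copy
y <- x
str_sub(y, 1, 3) <- "str"                   # overwrite characters 1 to 3

titled <- str_to_title("hello world")

print(first_only)  # "b-nana" "p-paya"
print(every_one)   # "b-n-n-" "p-p-y-"
print(y)           # "strana" "straya"
print(titled)      # "Hello World"
```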


Subset Strings 
str_sub(string, start = 1L, end = -1L) Extract
substrings from a character vector. 
str_sub(fruit, 1, 3); str_sub(fruit, -2) 


str_subset(string, pattern) Return only the 
strings that contain a pattern match. 
str_subset(fruit, "b") 


str_extract(string, pattern) Return the first 
pattern match found in each string, as a vector. 
Also str_extract_all to return every pattern 
match. str_extract(fruit, "[aeiou]") 


str_match(string, pattern) Return the first 
pattern match found in each string, as a 
matrix with a column for each () group in 
pattern. Also str_match_all. 
str_match(sentences, "(a|the) ([^ ]+)")
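
To make the contrast between str_subset, str_extract, and str_match concrete, here is a small sketch that reuses the "(a|the) ([^ ]+)" pattern from above. The vector x is a hypothetical input standing in for the sentences dataset.

```r
library(stringr)

# Hypothetical input vector (an assumption for illustration)
x <- c("the cat", "a dog", "no article")

kept    <- str_subset(x, "o")                  # whole strings that contain "o"
whole   <- str_extract(x, "(a|the) ([^ ]+)")   # matched text only, NA when no match
groups  <- str_match(x, "(a|the) ([^ ]+)")     # matrix: full match, then one column per () group
heads   <- str_sub(x, 1, 3)                    # first three characters
tails   <- str_sub(x, -2)                      # last two characters

print(kept)    # "a dog" "no article"
print(whole)   # "the cat" "a dog" NA
print(groups)  # row 1: "the cat" "the" "cat"; row 3 is all NA
print(heads)   # "the" "a d" "no "
print(tails)   # "at" "og" "le"
```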


Join and Split 
str_c(..., sep = "", collapse = NULL) Join
multiple strings into a single string. 
str_c(letters, LETTERS) 

str_c(..., sep = "", collapse = "") Collapse
a vector of strings into a single string. 
str_c(letters, collapse = "") 


str_dup(string, times) Repeat strings times 
times. str_dup(fruit, times = 2) 


str_split_fixed(string, pattern, n) Split a 
vector of strings into a matrix of substrings 
(splitting at occurrences of a pattern match). 
Also str_split to return a list of substrings. 
str_split_fixed(fruit, " ", n = 2)


str_glue(..., .sep = "", .envir = parent.frame())
Create a string from strings and {expressions}
to evaluate. str_glue("Pi is {pi}")


str_glue_data(.x, ..., .sep = "", .envir =
parent.frame(), .na = "NA") Use a data frame,
list, or environment to create a string from
strings and {expressions} to evaluate.
str_glue_data(mtcars, "{rownames(mtcars)} has {hp} hp")
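
The roles of sep and collapse in str_c, and the matrix versus list results of the two split functions, can be sketched like this. All inputs are hypothetical values chosen for illustration.

```r
library(stringr)

joined    <- str_c("a", "b", sep = "-")               # element-wise join with a separator
collapsed <- str_c(c("a", "b", "c"), collapse = "")    # one vector folded into a single string
both      <- str_c(c("a", "b"), c("x", "y"), sep = "_", collapse = ", ")
doubled   <- str_dup("ab", 2)

# Hypothetical input vector (an assumption for illustration)
x <- c("red apple", "kiwi")
parts_matrix <- str_split_fixed(x, " ", n = 2)  # always n columns, padded with ""
parts_list   <- str_split(x, " ")               # list with one character vector per string

name   <- "pi"
glued  <- str_glue("{name} is about {round(pi, 2)}")  # {} expressions are evaluated

print(joined)        # "a-b"
print(collapsed)     # "abc"
print(both)          # "a_x, b_y"
print(doubled)       # "abab"
print(parts_matrix)  # row 1: "red" "apple"; row 2: "kiwi" ""
print(glued)         # pi is about 3.14
```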


Manage Lengths 
str_length(string) The width of strings (i.e.
number of code points, which generally equals 
the number of characters). str_length(fruit) 


str_pad(string, width, side = c("left", "right",
"both"), pad = " ") Pad strings to constant
width. str_pad(fruit, 17)


str_trunc(string, width, side = c("right", "left",
"center"), ellipsis = "...") Truncate the width of
strings, replacing content with ellipsis.
str_trunc(fruit, 3)


str_trim(string, side = c("both", "left", "right")) 
Trim whitespace from the start and/or end of a 
string. str_trim(fruit)
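
How the side argument changes padding and trimming, and how the ellipsis counts toward the width in str_trunc, can be sketched as follows. The input strings are hypothetical values chosen for illustration.

```r
library(stringr)

lens        <- str_length(c("kiwi", "banana"))

pad_left    <- str_pad("kiwi", 8)                            # default side = "left"
pad_right   <- str_pad("kiwi", 8, side = "right")
pad_both    <- str_pad("kiwi", 8, side = "both", pad = "*")

# width includes the ellipsis: 3 kept characters + "..." = 6
truncated   <- str_trunc("blueberry", 6)

trim_both   <- str_trim("  kiwi  ")                          # default side = "both"
trim_left   <- str_trim("  kiwi  ", side = "left")

print(lens)       # 4 6
print(pad_left)   # "    kiwi"
print(pad_right)  # "kiwi    "
print(pad_both)   # "**kiwi**"
print(truncated)  # "blu..."
print(trim_both)  # "kiwi"
print(trim_left)  # "kiwi  "
```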


Order Strings

str_order(x, decreasing = FALSE, na_last =
TRUE, locale = "en", numeric = FALSE, ...)¹ Return
the vector of indexes that sorts a character
vector. x[str_order(x)]


str_sort(x, decreasing = FALSE, na_last = TRUE,
locale = "en", numeric = FALSE, ...)¹ Sort a
character vector. str_sort(x)


Helpers

str_conv(string, encoding) Override the
encoding of a string. str_conv(fruit, "ISO-8859-1")


str_view(string, pattern, match = NA) View 
HTML rendering of first regex match in each 
string. str_view(fruit, "[aeiou]") 


str_view_all(string, pattern, match = NA) View 
HTML rendering of all regex matches. 
str_view_all(fruit, "[aeiou]") 


str_wrap(string, width = 80, indent = 0, exdent
= 0) Wrap strings into nicely formatted
paragraphs. str_wrap(sentences, 20)
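
The numeric argument of str_sort and str_order, listed under Order Strings, is easy to overlook. A minimal sketch of its effect, assuming a hypothetical vector of file names:

```r
library(stringr)

# Hypothetical input vector (an assumption for illustration)
x <- c("file10", "file2", "file1")

alphabetic <- str_sort(x)                   # digits compared character by character
natural    <- str_sort(x, numeric = TRUE)   # runs of digits compared as numbers
idx        <- str_order(x, numeric = TRUE)  # indexes that would sort x
via_index  <- x[idx]                        # same result as str_sort()

print(alphabetic)  # "file1" "file10" "file2"
print(natural)     # "file1" "file2" "file10"
print(idx)         # 3 2 1
```

str_order is the better fit when the same ordering has to be applied to other vectors or to the rows of a data frame.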


¹ See bit.ly/ISO639-1 for a complete list of locales.


Regular Expressions

Regular expressions, or regexps, are a concise language for
describing patterns in strings.


Need to Know

Pattern arguments in stringr are interpreted as
regular expressions after any special characters
have been parsed.

In R, you write regular expressions as strings,
sequences of characters surrounded by double quotes
("") or single quotes ('').

Some characters cannot be represented directly
in an R string. These must be represented as
special characters, sequences of characters that
have a specific meaning, e.g.

Special Character   Represents
\\                  \
\"                  "
\n                  new line

Run ?"'" to see a complete list.

Because of this, whenever a \ appears in a regular
expression, you must write it as \\ in the string
that represents the regular expression.

Use writeLines() to see how R views your string
after all special characters have been parsed.

writeLines("\\ is a backslash")
# \ is a backslash


Patterns in stringr are interpreted as regexes. To
change this default, wrap the pattern in one of:

regex(pattern, ignore_case = FALSE, multiline =
FALSE, comments = FALSE, dotall = FALSE, ...)
Modifies a regex to ignore case, match the end of
lines as well as the end of strings, allow R comments
within regexes, and/or have . match everything
including \n.
str_detect("I", regex("i", TRUE))

fixed() Matches raw bytes but will miss some
characters that can be represented in multiple
ways (fast). str_detect("\u0130", fixed("i"))

coll() Matches raw bytes and will use locale
specific collation rules to recognize characters
that can be represented in multiple ways (slow).
str_detect("\u0130", coll("i", TRUE, locale = "tr"))

boundary() Matches boundaries between
characters, line_breaks, sentences, or words.
str_split(sentences, boundary("word"))


MATCH CHARACTERS   see <- function(rx) str_view_all("abc ABC 123\t.!?\\(){}\n", rx)

string         regexp           matches                                   example
(type this)    (to mean this)   (which matches this)
a (etc.)       a (etc.)         a (etc.)                                  see("a")
\\.            \.               .                                         see("\\.")
\\!            \!               !                                         see("\\!")
\\?            \?               ?                                         see("\\?")
\\\\           \\               \                                         see("\\\\")
\\(            \(               (                                         see("\\(")
\\)            \)               )                                         see("\\)")
\\{            \{               {                                         see("\\{")
\\}            \}               }                                         see("\\}")
\\n            \n               new line (return)                         see("\\n")
\\t            \t               tab                                       see("\\t")
\\s            \s               any whitespace (\S for non-whitespaces)   see("\\s")
\\d            \d               any digit (\D for non-digits)             see("\\d")
\\w            \w               any word character (\W for non-word chars) see("\\w")
\\b            \b               word boundaries                           see("\\b")
[:digit:]¹     [:digit:]        digits                                    see("[:digit:]")
[:alpha:]      [:alpha:]        letters                                   see("[:alpha:]")
[:lower:]      [:lower:]        lowercase letters                         see("[:lower:]")
[:upper:]      [:upper:]        uppercase letters                         see("[:upper:]")
[:alnum:]      [:alnum:]        letters and numbers                       see("[:alnum:]")
[:punct:]      [:punct:]        punctuation                               see("[:punct:]")
[:graph:]      [:graph:]        letters, numbers, and punctuation         see("[:graph:]")
[:space:]      [:space:]        space characters (i.e. \s)                see("[:space:]")
[:blank:]      [:blank:]        space and tab (but not new line)          see("[:blank:]")
.              .                every character except a new line         see(".")

¹ Many base R functions require classes to be wrapped in a second set of [ ], e.g. [[:digit:]]


ALTERNATES   alt <- function(rx) str_view_all("abcde", rx)

regexp     matches          example
ab|d       or               alt("ab|d")
[abe]      one of           alt("[abe]")
[^abe]     anything but     alt("[^abe]")
[a-c]      range            alt("[a-c]")


ANCHORS   anchor <- function(rx) str_view_all("aaa", rx)

regexp     matches            example
^a         start of string    anchor("^a")
a$         end of string      anchor("a$")


LOOK AROUNDS   look <- function(rx) str_view_all("bacad", rx)

regexp     matches            example
a(?=c)     followed by        look("a(?=c)")
a(?!c)     not followed by    look("a(?!c)")
(?<=b)a    preceded by        look("(?<=b)a")
(?<!b)a    not preceded by    look("(?<!b)a")


QUANTIFIERS   quant <- function(rx) str_view_all(".a.aa.aaa", rx)

regexp     matches              example
a?         zero or one          quant("a?")
a*         zero or more         quant("a*")
a+         one or more          quant("a+")
a{n}       exactly n            quant("a{2}")
a{n, }     n or more            quant("a{2,}")
a{n, m}    between n and m      quant("a{2,4}")


GROUPS   ref <- function(rx) str_view_all("abbaab", rx)

Use parentheses to set precedence (order of evaluation) and create groups.

regexp     matches            example
(ab|d)e    sets precedence    alt("(ab|d)e")

Use an escaped number to refer to and duplicate parentheses groups that occur
earlier in a pattern. Refer to each group by its order of appearance.

string         regexp           matches                  example
(type this)    (to mean this)   (which matches this)     (the result is the same as ref("abba"))
\\1            \1 (etc.)        first () group, etc.     ref("(a)(b)\\2\\1")
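
The double-backslash rule and the pattern wrappers described above can be checked directly. The following is a minimal sketch; the vector x is a hypothetical input chosen so that an escaped and an unescaped dot give different answers.

```r
library(stringr)

# Hypothetical input vector (an assumption for illustration)
x <- c("a.c", "abc")

any_char    <- str_detect(x, ".")         # unescaped . matches any character
literal_dot <- str_detect(x, "\\.")       # the string "\\." is the regexp \. (a literal dot)
fixed_dot   <- str_detect(x, fixed("."))  # fixed() skips regexp interpretation entirely

# writeLines() shows the regexp that stringr actually receives
writeLines("\\.")                         # prints \.

case_default <- str_detect("I", "i")
case_ignored <- str_detect("I", regex("i", ignore_case = TRUE))

print(any_char)      # TRUE TRUE
print(literal_dot)   # TRUE FALSE
print(fixed_dot)     # TRUE FALSE
print(case_default)  # FALSE
print(case_ignored)  # TRUE
```

When a pattern is meant literally, fixed() avoids escaping altogether, at the cost of byte-level matching.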
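
Because str_view_all renders HTML, the quantifier, look-around, and group patterns are easier to verify in a script with str_extract and str_locate. This sketch reuses the demonstration strings from the helper functions above; swapping the viewing functions for extraction functions is a choice made here for checkability, not something the sheet prescribes.

```r
library(stringr)

# Quantifiers on ".a.aa.aaa"
one_or_more <- str_extract_all(".a.aa.aaa", "a+")[[1]]
two_or_more <- str_extract_all(".a.aa.aaa", "a{2,}")[[1]]

# Look arounds on "bacad": only the position of the "a" differs
followed_by_c     <- str_locate("bacad", "a(?=c)")[1, "start"]
not_followed_by_c <- str_locate("bacad", "a(?!c)")[1, "start"]
preceded_by_b     <- str_locate("bacad", "(?<=b)a")[1, "start"]
not_preceded_by_b <- str_locate("bacad", "(?<!b)a")[1, "start"]

# Groups: parentheses change what | applies to
ungrouped <- str_extract("abcde", "ab|de")     # "ab" or "de", first hit wins
grouped   <- str_extract("abcde", "(ab|d)e")   # ("ab" or "d") followed by "e"

# Back references: \\2 repeats group 2, \\1 repeats group 1
backref <- str_extract("abbaab", "(a)(b)\\2\\1")

print(one_or_more)  # "a" "aa" "aaa"
print(two_or_more)  # "aa" "aaa"
print(c(followed_by_c, not_followed_by_c, preceded_by_b, not_preceded_by_b))  # 2 4 2 4
print(ungrouped)    # "ab"
print(grouped)      # "de"
print(backref)      # "abba"
```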